Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
Authors
Abstract
Preconditioning and Regularization Enable Faster Reinforcement Learning. Natural policy gradient (NPG) methods, in conjunction with entropy regularization to encourage exploration, are among the most popular optimization algorithms in contemporary reinforcement learning. Despite their empirical success, the theoretical underpinnings of NPG methods remain severely limited. In “Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization”, Cen, Cheng, Chen, Wei, and Chi develop nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on tabular discounted Markov decision processes. Assuming access to exact policy evaluation, the authors demonstrate that the algorithm converges linearly at an astonishing rate that is independent of the dimension of the state-action space. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Accommodating a wide range of learning rates, this result highlights the role of preconditioning and regularization in enabling fast convergence.
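As a concrete illustration, below is a minimal sketch of the closed-form entropy-regularized NPG update under softmax parameterization for a tabular discounted MDP, assuming exact (iterative) soft policy evaluation. The function names, the toy MDP, the iteration counts, and the stopping choices are illustrative assumptions, not the authors' code.

```python
import numpy as np

def soft_q_eval(P, r, pi, gamma, tau, iters=2000):
    """Evaluate the entropy-regularized (soft) Q-function of policy pi.

    P:  transition tensor, shape (S, A, S)
    r:  reward matrix,     shape (S, A)
    pi: policy,            shape (S, A), rows sum to 1
    """
    Q = np.zeros_like(r)
    for _ in range(iters):
        # soft state value: V(s) = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s))
        V = np.sum(pi * (Q - tau * np.log(pi + 1e-12)), axis=1)
        Q = r + gamma * (P @ V)  # soft Bellman update for a fixed policy
    return Q

def npg_entropy_step(pi, Q, eta, gamma, tau):
    """One entropy-regularized NPG step in closed form (softmax parameterization):

        pi_{t+1}(a|s) ∝ pi_t(a|s)^{1 - eta*tau/(1-gamma)} * exp(eta * Q(s,a) / (1-gamma)),

    valid for learning rates 0 < eta <= (1-gamma)/tau.
    """
    alpha = 1.0 - eta * tau / (1.0 - gamma)
    logits = alpha * np.log(pi + 1e-12) + eta * Q / (1.0 - gamma)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Toy usage on a random MDP (sizes and constants are hypothetical).
S, A, gamma, tau, eta = 4, 3, 0.9, 0.1, 0.5
rng = np.random.default_rng(0)
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))
pi = np.full((S, A), 1.0 / A)  # uniform initialization
for _ in range(50):
    Q = soft_q_eval(P, r, pi, gamma, tau)
    pi = npg_entropy_step(pi, Q, eta, gamma, tau)
```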
Similar resources
Global Convergence of Policy Gradient Methods for Linearized Control Problems
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model, 2) they are an “end-to-end” approach, directly optimizing the performance metric of interest, 3) they inherently allow for richly parameterized policies. A notable drawback is th...
Gradient Convergence in Gradient Methods with Errors
We consider the gradient method x_{t+1} = x_t + γ_t(s_t + w_t), where s_t is a descent direction of a function f : ℝ^n → ℝ and w_t is a deterministic or stochastic error. We assume that ∇f is Lipschitz continuous, that the stepsize γ_t diminishes to 0, and that s_t and w_t satisfy standard conditions. We show that either f(x_t) → −∞ or f(x_t) converges to a finite value and ∇f(x_t) → 0 (with probability 1 in t...
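For intuition, a minimal sketch of this iteration, taking the descent direction to be the negative gradient, a diminishing stepsize γ_t = 1/(t+1), and Gaussian errors; these specific choices and the function name are illustrative assumptions.

```python
import numpy as np

def gradient_method_with_errors(grad_f, x0, steps=5000, noise_scale=0.1, seed=0):
    """Iterate x_{t+1} = x_t + gamma_t * (s_t + w_t) with s_t = -grad f(x_t),
    gamma_t = 1/(t+1) diminishing to 0, and a stochastic error w_t."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for t in range(steps):
        gamma_t = 1.0 / (t + 1)                              # diminishing stepsize
        s_t = -grad_f(x)                                     # descent direction
        w_t = noise_scale * rng.standard_normal(x.shape)     # stochastic error
        x = x + gamma_t * (s_t + w_t)
    return x

# Example: f(x) = ||x||^2 / 2, so grad f(x) = x.
x_final = gradient_method_with_errors(lambda x: x, x0=[5.0, -3.0])
```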
Fast global convergence of gradient methods for high-dimensional statistical recovery
Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the data dimension d to grow with (and possibly exceed) the samp...
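A minimal sketch of a composite gradient method for one canonical instance of this setup, the lasso (squared loss plus an ℓ1 regularizer); the paper analyzes a broader class of M-estimators, and the function name and default parameters here are illustrative assumptions.

```python
import numpy as np

def composite_gradient_lasso(X, y, lam, step, iters=500):
    """Composite (proximal) gradient method for
       minimize_beta  (1/2n) * ||y - X @ beta||^2 + lam * ||beta||_1."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / n                                # gradient of the smooth loss
        z = beta - step * grad                                         # gradient step
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)    # soft-thresholding prox of the l1 term
    return beta
```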
Global Convergence of Conjugate Gradient Methods without Line Search
Global convergence results are derived for well-known conjugate gradient methods in which the line search step is replaced by a step whose length is determined by a formula. The results include the following cases: 1. The Fletcher-Reeves method, the Hestenes-Stiefel method, and the Dai-Yuan method applied to a strongly convex LC¹ objective function; 2. The Polak-Ribière method and the Conjugate ...
Global Convergence Properties of Conjugate Gradient Methods for Optimization
This paper explores the convergence of nonlinear conjugate gradient methods without restarts, and with practical line searches. The analysis covers two classes of methods that are globally convergent on smooth, nonconvex functions. Some properties of the Fletcher-Reeves method play an important role in the first family, whereas the second family shares an important property with the Polak-Ribir...
Journal
Journal title: Operations Research
Year: 2022
ISSN: 1526-5463, 0030-364X
DOI: https://doi.org/10.1287/opre.2021.2151